## _
## platform x86_64-w64-mingw32
## arch x86_64
## os mingw32
## system x86_64, mingw32
## status
## major 3
## minor 4.3
## year 2017
## month 11
## day 30
## svn rev 73796
## language R
## version.string R version 3.4.3 (2017-11-30)
## nickname Kite-Eating Tree
library(readr)
# Load the cleaned HappyDB happy-moments file straight from GitHub.
urlfile <- 'https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/cleaned_hm.csv'
hm_data <- read_csv(urlfile)
## Warning: NAs introduced by coercion
HappyDB is a corpus of 100,000 crowd-sourced happy moments collected via Amazon’s Mechanical Turk. You can read more about it at https://arxiv.org/abs/1801.07746.
For this analysis, I wanted to pursue a personal curiosity: what brings happiness to peers in my own age group (26-30)? There is a common myth that girls mature, or “reach adulthood,” a few years earlier than boys.
Can we make some inferences from the happy moments of millennials? How do males and females differ in this regard?
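The summaries in this report include a `peer_agegroup` column with levels `non-peer` and `peer`, but the code that builds it is not shown. Here is a minimal sketch of how such a flag could be derived with dplyr, assuming `age` arrives as text and the peer group is ages 26 to 30 inclusive. The `toy` data frame and its values are invented for illustration.

```r
library(dplyr)

# Toy stand-in for the demographic columns; values are invented.
toy <- tibble(
  wid = 1:5,
  age = c("25", "26", "30", "31", "prefer not to say")
)

toy <- toy %>%
  mutate(
    # Non-numeric entries become NA (this is what triggers a coercion warning).
    age = suppressWarnings(as.numeric(age)),
    # Flag respondents aged 26-30 as peers; everyone else, including NA, is a non-peer.
    peer_agegroup = factor(
      if_else(!is.na(age) & age >= 26 & age <= 30, "peer", "non-peer"),
      levels = c("non-peer", "peer")
    )
  )

# Keep only the peer group for the gender comparison.
peers <- filter(toy, peer_agegroup == "peer")
```

Because `filter()` keeps unused factor levels, a summary of `peers` would still list `non-peer` with a count of 0, which matches what the second summary in this report shows.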
##       wid         original_hm        gender       marital      parenthood
##  Min.   :    1   Length:94340       f:38650   married:40890   n:58715
##  1st Qu.:  403   Class :character   m:55690   single :53450   y:35625
##  Median : 1099   Mode  :character
##  Mean   : 2682
##  3rd Qu.: 3319
##  Max.   :13839
##
##  reflection_period      age          country
##  hours_24:46821     Min.   :17.00   USA    :73328
##  months_3:47519     1st Qu.:25.00   IND    :16551
##                     Median :30.00   VEN    :  546
##                     Mean   :31.86   CAN    :  531
##                     3rd Qu.:35.00   GBR    :  352
##                     Max.   :95.00   (Other): 2885
##                                     NA's   :  147
##       ground_truth_category        predicted_category       text
##  affection       : 4536      achievement     :31865   Length:94340
##  achievement     : 4038      affection       :31969   Class :character
##  bonding         : 1662      bonding         :10116   Mode  :character
##  enjoy_the_moment: 1425      enjoy_the_moment:10467
##  leisure         : 1254      exercise        : 1146
##  (Other)         :  434      leisure         : 7071
##  NA's            :80991      nature          : 1706
##      count           peer_agegroup
##  Min.   :  1.000   non-peer:65663
##  1st Qu.:  3.000   peer    :28677
##  Median :  5.000
##  Mean   :  6.168
##  3rd Qu.:  7.000
##  Max.   :509.000
##
##       wid         original_hm        gender       marital      parenthood
##  Min.   :    2   Length:28677       f:10793   married:11092   n:19718
##  1st Qu.:  354   Class :character   m:17884   single :17585   y: 8959
##  Median :  925   Mode  :character
##  Mean   : 2438
##  3rd Qu.: 2788
##  Max.   :13828
##
##  reflection_period      age          country
##  hours_24:14074     Min.   :26.00   USA    :20994
##  months_3:14603     1st Qu.:27.00   IND    : 6378
##                     Median :28.00   CAN    :  156
##                     Mean   :27.98   PHL    :  129
##                     3rd Qu.:29.00   VEN    :  102
##                     Max.   :30.00   (Other):  894
##                                     NA's   :   24
##       ground_truth_category        predicted_category       text
##  affection       : 1406      achievement     :9627   Length:28677
##  achievement     : 1208      affection       :9232   Class :character
##  leisure         :  552      bonding         :3200   Mode  :character
##  bonding         :  530      enjoy_the_moment:3169
##  enjoy_the_moment:  476      exercise        : 411
##  (Other)         :  127      leisure         :2610
##  NA's            :24378      nature          : 428
##      count           peer_agegroup
##  Min.   :  1.000   non-peer:    0
##  1st Qu.:  3.000   peer    :28677
##  Median :  5.000
##  Mean   :  6.357
##  3rd Qu.:  7.000
##  Max.   :509.000
##
What words occur most frequently?
“Friends,” “day,” and “time” all feature heavily for both sexes, but females have “husband” as the #4 word, while for males #4 is “played”; “wife” does not appear until #10.
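The ranking code is not shown here, but the "Selecting by tf_idf" message in the output suggests the words were ranked by tf-idf within each gender. Below is a rough sketch of that kind of pipeline using tidytext, assuming one token per row, standard English stop words removed, and each gender treated as a single "document". The `toy_hm` sentences are made up, and I use `slice_max()` here rather than whatever selection call the original chunk used.

```r
library(dplyr)
library(tidytext)

# Invented happy moments, two per gender, standing in for the real corpus.
toy_hm <- tibble(
  gender = c("f", "f", "m", "m"),
  text = c(
    "my husband cooked dinner for me",
    "spent the day with my husband and friends",
    "played basketball with friends",
    "played a new game all day"
  )
)

word_counts <- toy_hm %>%
  unnest_tokens(word, text) %>%            # one lowercase word per row
  anti_join(stop_words, by = "word") %>%   # drop common English stop words
  count(gender, word, sort = TRUE)         # word frequency within each gender

top_words <- word_counts %>%
  bind_tf_idf(word, gender, n) %>%         # each gender acts as one "document"
  group_by(gender) %>%
  slice_max(tf_idf, n = 3, with_ties = FALSE) %>%
  ungroup()
```

Note that tf-idf gives a score of zero to any word used by both groups (such as "friends" in this toy data), so it highlights what is distinctive to each gender rather than what is simply most common. Raw counts from `word_counts` answer the "most frequent" question more directly.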
## [1] "numeric"
## [1] "tbl_df" "tbl" "data.frame"
## Warning in mutate_impl(.data, dots): Unequal factor levels: coercing to
## character
## Warning in mutate_impl(.data, dots): binding character and factor vector,
## coercing into character vector
## Warning in mutate_impl(.data, dots): binding character and factor vector,
## coercing into character vector
## Selecting by tf_idf
What if we look at pairs of words that occur together (bigrams)?
The #1 bigram for men aged 26-30 is… “video games”! Additionally, “played video” and “played games” appear as well.
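For reference, bigram counts like these can be produced with tidytext's n-gram tokenizer. This is only a sketch under my own assumptions (invented `toy_hm` sentences, and a bigram is dropped when either word is a stop word); the original chunk may have filtered or ranked differently.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Invented happy moments standing in for the real corpus.
toy_hm <- tibble(
  gender = c("m", "m", "f"),
  text = c(
    "I played video games with my brother",
    "played video games all night",
    "my husband surprised me with flowers"
  )
)

bigram_counts <- toy_hm %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%      # consecutive word pairs
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%   # split to inspect each word
  filter(!word1 %in% stop_words$word,                           # drop pairs containing
         !word2 %in% stop_words$word) %>%                       # a stop word
  unite(bigram, word1, word2, sep = " ") %>%                    # glue the pair back together
  count(gender, bigram, sort = TRUE)
```

With this toy input, "played video" and "video games" each occur twice for `m`, which mirrors how overlapping bigrams from the same phrase can all rank highly at once.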
## Warning in mutate_impl(.data, dots): Unequal factor levels: coercing to
## character
## Warning in mutate_impl(.data, dots): binding character and factor vector,
## coercing into character vector
## Warning in mutate_impl(.data, dots): binding character and factor vector,
## coercing into character vector
## Selecting by tf_idf
Summary:
1. While playing video games appears prominently among the happy moments of men aged 26-30, they draw similar happiness from promotions/achievements and moments of affection (dating their girlfriends or their wives giving birth).
2. For women, husbands, sons, and daughters come up more than I initially expected, whereas the word “boyfriend” occurs far less frequently.